Conversation

@eddierichter-amd
Collaborator

The initial implementation of summarization used the .generate() method in the ChatSDK, which bypassed sending the previously stored messages to the LLM. It therefore had to send the entire prompt to the agent in a slightly different format, which bypassed the KV cache and forced the whole prompt to be reprocessed. This resulted in very high TTFT for summarization. This change reduces the TTFT of summarization to be similar to other requests.
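
Roughly, the change looks like this (a minimal sketch only; the old prompt-assembly helper and the exact response handling are assumptions, not the actual ChatSDK code):

# Old path (sketch): assemble one standalone prompt and call generate().
# Nothing from the stored history is reused, so the KV cache from earlier
# turns cannot be hit and the whole prompt must be processed -> high TTFT.
full_prompt = build_standalone_summary_prompt(self.chat_history)  # hypothetical helper
summary = self.llm_client.generate(full_prompt)

# New path (sketch): go through the normal send() path. The stored history
# is formatted exactly like a regular turn, so the KV pairs for the earlier
# messages are reused and only the new summary prompt needs processing.
chat_response = self.send(
    summary_prompt,
    max_tokens=min(self.config.max_tokens, 2048),
    timeout=1200,
)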

@eddierichter-amd eddierichter-amd force-pushed the eddie/summarization-kv-cache-eddie-fork branch from 4e2b5ae to 7a33fa9 on January 9, 2026 19:14
@kovtcharov-amd
Collaborator

@claude review this PR.

@github-actions github-actions bot added the chat Chat SDK changes label Jan 12, 2026
@github-actions

github-actions bot commented Jan 12, 2026

Claude finished @kovtcharov-amd's task


PR Review: KV Cache for Summarization

  • Read the PR diff and understand the changes
  • Review code quality and correctness
  • Check for potential issues
  • Provide review feedback

Overall Assessment: 🟢 Looks good with minor suggestions

The PR correctly addresses the TTFT issue by switching from llm_client.generate() to send(), which reuses the existing chat history formatting path and enables KV cache utilization.

What's Good

  1. KV cache optimization - Using send() with existing history allows the LLM to leverage cached KV pairs from previous turns
  2. History preservation on failure - The backup/restore pattern (original_history) properly handles error cases
  3. Consistent formatting - History formatting now follows the same path as normal chat turns

Issues Found

🟡 History pollution - The send() call at line 831 will append BOTH the summary prompt as a user message AND the summary response to chat_history (see sdk.py:362 and sdk.py:391). Then the code clears history and adds just the summary at lines 850-853. This works but creates an intermediate state where the summarization messages briefly exist in history.

Suggestion: Consider using no_history=True parameter in the send() call to avoid polluting history, then manually manage what gets stored:

chat_response = self.send(
    summary_prompt,
    no_history=True,  # Don't add to history
    max_tokens=min(self.config.max_tokens, 2048),
    timeout=1200,
)

This would be cleaner since you're clearing and replacing history anyway. However, using no_history=True would also prevent the existing history context from being sent to the LLM, so the current approach may actually be intentional to ensure the model sees the conversation when summarizing. Please confirm this is the intended behavior.

🟢 Minor - Line 843: The if chat_response check is good defensive coding, but send() raises on error rather than returning None, so the empty string fallback may never trigger. Not a problem, just redundant.

Verification

The logic flow is:

  1. Backup history → 2. Call send() (adds prompt+response to history) → 3. Clear history → 4. Add only summary

This correctly leaves only the summary in history after completion.
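
Sketched in code (illustrative only, based on the flow above; names other than send(), chat_history, and original_history are assumptions rather than the actual sdk.py implementation):

original_history = list(self.chat_history)  # 1. back up history
try:
    # 2. send() formats the stored history like a normal turn (enabling KV
    #    cache reuse) and appends both the prompt and the response to history.
    chat_response = self.send(
        summary_prompt,
        max_tokens=min(self.config.max_tokens, 2048),
        timeout=1200,
    )
    summary = chat_response if chat_response else ""  # mirrors the check at line 843
except Exception:
    self.chat_history = original_history  # restore untouched history on failure
    raise
else:
    # 3. clear history, then 4. keep only the summary going forward
    # (the message shape shown here is hypothetical).
    self.chat_history = [{"role": "assistant", "content": summary}]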

Approve with minor suggestion - The code is functional and achieves the stated goal. The no_history=True suggestion is optional depending on whether you want the LLM to see prior context during summarization.


@eddierichter-amd
Collaborator Author

@claude your guess at the end is correct: if we set no_history=True we would not have the history and we wouldn't be able to summarize the context. Are there any other medium-priority issues with this PR?

@github-actions

github-actions bot commented Jan 13, 2026

Claude finished @eddierichter-amd's task


Re-reviewing PR for Medium Priority Issues

  • Read the PR diff
  • Check for any medium priority concerns
  • Provide findings

Thanks for confirming the no_history design decision! That makes sense.

Additional Findings:

After re-examining the PR, I don't see any other medium priority issues. The changes are clean and correct. Here's my assessment of the remaining minor items:

🟢 Minor (already mentioned): Line 843 has a defensive check if chat_response else "" that's technically redundant since send() raises exceptions rather than returning None. However, it's harmless defensive coding.

🟢 Minor: The prompt at lines 823-824 now says "You have full access to the prior conversation history above" - this wording is clear and appropriate for the new flow.

No other issues found. The implementation correctly:

  • Backs up history before the operation
  • Restores history on failure paths
  • Clears and replaces with summary on success
  • Uses appropriate timeout and max_tokens constraints

The PR is good to merge from a code quality perspective.


@eddierichter-amd
Collaborator Author

@kovtcharov-amd this has passed Claude's review. Let me know what you think. This is a really nice quality-of-life improvement, as it reduces the time for summarization from over a minute to a couple of seconds.

@eddierichter-amd eddierichter-amd added this pull request to the merge queue Jan 14, 2026
Merged via the queue into amd:main with commit 5783754 Jan 14, 2026
64 of 70 checks passed
@eddierichter-amd eddierichter-amd deleted the eddie/summarization-kv-cache-eddie-fork branch January 14, 2026 18:28